Goal
- Leave confident you can use ggplot
Agenda
- Intro and Motivation
- Grammar
- Toolkit
- Facets, Scales, Themes
- Full Example
August 17, 2017
R is a language for statistical computing and data science
ggplot2 is an R package for data visualization
mpg
| manufacturer | model | displ | hwy | cyl | class |
|---|---|---|---|---|---|
| audi | a4 | 1.8 | 29 | 4 | compact |
| audi | a4 | 1.8 | 29 | 4 | compact |
| dodge | caravan 2wd | 2.4 | 24 | 4 | minivan |
| dodge | caravan 2wd | 3.0 | 24 | 6 | minivan |
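To follow along, a minimal sketch for loading the package and looking at the columns in the table above. It assumes ggplot2 is installed; the mpg data frame ships with the package, and the name mpg_small is only illustrative.

```r
# Load ggplot2; the mpg data frame is bundled with the package
library(ggplot2)

# Keep only the columns shown in the table above
cols <- c("manufacturer", "model", "displ", "hwy", "cyl", "class")
mpg_small <- mpg[, cols]

# Look at the first rows and the column types
head(mpg_small)
str(mpg_small)
```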
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
Pieces of the call: ggplot() starts the plot, mpg is the data, aes(...) maps columns to aesthetics, geom_point() adds a layer of points
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(aes(color = class))
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(color = "darkgreen")
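The two calls above differ in whether color is mapped to a data column or set to a constant. The sketch below puts them side by side and adds a third variant, not on the slides, that shows a common slip. The variable names are illustrative.

```r
library(ggplot2)

# Mapping: color depends on a data column, so it goes inside aes()
# and ggplot2 adds a legend for class
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

# Setting: one fixed color for every point, so it goes outside aes()
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "darkgreen")

# A common slip: a constant inside aes() is treated as data,
# so the points are not drawn in dark green and a legend appears
p_slip <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "darkgreen"))
```

Typing p_mapped, p_fixed, or p_slip at the console draws the corresponding plot.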
... + geom_smooth(method = "lm")
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(aes(color = class)) + geom_smooth(method = "lm", se = FALSE)
ggplot(mpg, aes(x = displ, y = hwy, color = class)) + geom_point() + geom_smooth(method = "lm")
ggplot(mpg, aes(x = displ, y = hwy, color = class)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
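Where an aesthetic is mapped decides which layers see it. A short sketch of the contrast, using only calls from the slides; the variable names are illustrative.

```r
library(ggplot2)

# color mapped in one layer only: colored points, a single trend line
p_one_line <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", se = FALSE)

# color mapped in ggplot(): every layer inherits it,
# so geom_smooth() fits one line per class
p_line_per_class <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```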
Pieces of the call: mpg (data), aes() (aesthetic mapping), geom_ (geometry), stat = "count" (statistic)
ggplot(mpg, aes(x = class)) + geom_bar(stat = "count")
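To see what the "count" statistic does, one option is to pull the computed data back out of the plot with layer_data() and compare it with a hand count. This is a sketch; the names p_bar and bar_data are illustrative.

```r
library(ggplot2)

p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar(stat = "count")

# layer_data() returns the data after the stat has run:
# one row per class with the computed count
bar_data <- layer_data(p_bar)
bar_data[, c("x", "count")]

# The same counts computed by hand
table(mpg$class)
```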
Plots based on data.frame
Map data columns to aesthetic properties
Build plots iteratively
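The three points above can be sketched in a few lines: start from a data frame, map columns to aesthetics, then add layers one at a time. The variable name p is illustrative.

```r
library(ggplot2)

# Start from a data frame and an aesthetic mapping
p <- ggplot(mpg, aes(x = displ, y = hwy))

# Add one layer at a time, checking the plot after each step
p <- p + geom_point(aes(color = class))
p <- p + geom_smooth(method = "lm", se = FALSE)

# Typing the name at the console draws the current state of the plot
p
```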
ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()
ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot() + geom_point()
ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot() + geom_jitter()
ggplot(mpg, aes(x = hwy)) + geom_histogram()
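Called with no arguments, geom_histogram() falls back to 30 bins and prints a message suggesting a deliberate choice. A minimal sketch that sets the bin width explicitly; the value 2 is only an illustration and does not come from the slides.

```r
library(ggplot2)

# binwidth is in the units of the x variable (miles per gallon here)
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

p_hist
```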
ggplot(economics, aes(x = date, y = psavert)) + geom_line()
... + geom_area()
... + geom_ribbon(aes(ymin = psavert - 0.5, ymax = psavert + 0.5))
... + geom_raster(aes(fill = density))
... + geom_contour(aes(z = density))
help(geom_boxplot)
geom_boxplot understands the following aesthetics (required aesthetics are in bold):
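Besides the help page, the aesthetics can also be read from the geom object itself. This is a sketch that relies on the exported GeomBoxplot object; the exact names returned depend on the installed ggplot2 version.

```r
library(ggplot2)

# Aesthetics that geom_boxplot() cannot work without
GeomBoxplot$required_aes

# Every aesthetic the geom understands, required or optional
GeomBoxplot$aesthetics()
```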
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
... + facet_wrap(~ cyl)
... + facet_wrap(~ cyl, scales = "free")
... + facet_wrap(~ cyl, scales = "free") + geom_smooth(method = "lm")
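Written out in full, the faceting steps above might look like the sketch below. The variable names are illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One panel per value of cyl, all panels sharing the same axes
p_shared <- p + facet_wrap(~ cyl)

# scales = "free" lets each panel pick its own x and y ranges
p_free <- p + facet_wrap(~ cyl, scales = "free")

# Layers added afterwards are drawn in every panel,
# and the linear fit is computed separately for each one
p_free_lm <- p_free + geom_smooth(method = "lm")
```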
A scale controls how the data appears on the plot
ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) + geom_point()
... + scale_x_continuous("Engine Displacement")
... + scale_y_continuous("Highway MPG", limits = c(0, 50),
                         breaks = seq(0, 50, by = 5))
... + scale_color_hue("Cylinders", l = 80, c = 60)
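Put together, the scale calls above form one plot. In this sketch the axis and legend titles are passed as name = explicitly, which is an editorial choice for readability; the slides pass them as the first positional argument.

```r
library(ggplot2)

p_scaled <- ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  # Position scales: axis title, limits, and tick positions
  scale_x_continuous(name = "Engine Displacement") +
  scale_y_continuous(name = "Highway MPG", limits = c(0, 50),
                     breaks = seq(0, 50, by = 5)) +
  # Color scale: legend title plus luminance (l) and chroma (c) of the hues
  scale_color_hue(name = "Cylinders", l = 80, c = 60)

p_scaled
```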
| Position | Color | Shape |
|---|---|---|
| scale_x_continuous | scale_color_gradient | scale_shape_discrete |
| scale_x_discrete | scale_color_discrete | scale_shape_manual |
| scale_x_date | | |
| scale_x_datetime | | |
| scale_x_sqrt | | |
| scale_x_log10 | | |
A theme controls finer elements of plot appearance
... + theme(axis.text.x = element_text(size = 16))
... + theme(axis.title.y = element_text(angle = 0))
| Plot | Axis | Legend | Panel | Faceting |
|---|---|---|---|---|
| plot.title | axis.line | legend.key | panel.border | strip.text |
| plot.margin | axis.text | legend.text | aspect.ratio | panel.spacing |
... + theme_bw()
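Order matters when a complete theme such as theme_bw() is combined with individual theme() tweaks: a complete theme replaces all earlier theme settings, so the tweaks are safest placed after it. A minimal sketch with illustrative variable names:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Complete theme first, then individual tweaks on top of it
p_themed <- p +
  theme_bw() +
  theme(
    axis.text.x = element_text(size = 16),
    axis.title.y = element_text(angle = 0)
  )

p_themed
```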
economics
| date | psavert |
|---|---|
| 1967-07-01 | 12.5 |
| 1967-08-01 | 12.5 |
| 1967-09-01 | 11.7 |
| 1967-10-01 | 12.5 |
| 1967-11-01 | 12.5 |
| 1967-12-01 | 12.1 |
ggplot(economics) + geom_line(aes(x = date, y = psavert), color = "orange")
... + geom_ribbon(aes(x = date, ymin = psavert - 1, ymax = psavert + 1))
pres2
| name | start | end | party |
|---|---|---|---|
| Reagan | 1981-01-20 | 1989-01-20 | Republican |
| Bush | 1989-01-20 | 1993-01-20 | Republican |
| Clinton | 1993-01-20 | 2001-01-20 | Democratic |
| Bush | 2001-01-20 | 2009-01-20 | Republican |
| Obama | 2009-01-20 | 2017-01-20 | Democratic |
... + geom_rect(aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = party), data = pres2)
... + scale_fill_manual(values = c("blue", "red"))
... + scale_x_date(date_breaks = "4 years", date_labels = "%Y")
... + geom_vline(aes(xintercept = start), data = pres2)
... + geom_text(aes(x = start, y = Inf, label = name), data = pres2, vjust = 1)
... + labs(x = NULL, y = "Rate", title = "Personal Savings Rate")
... + theme_bw()
ggplot(economics) + geom_ribbon() + geom_line() + geom_rect(data = pres2) + scale_fill_manual() + scale_x_date() + geom_vline() + geom_text() + labs() + theme_bw()
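For reference, one way the full example could be assembled into a single runnable script. Several details are assumptions rather than part of the slides: pres2 is rebuilt by hand from the table shown earlier, the rectangles span the full height with ymin = -Inf and ymax = Inf, the alpha values are arbitrary, and the text labels are pinned to the top edge with y = Inf.

```r
library(ggplot2)

# pres2 rebuilt from the table in the slides (assumed to be a plain data frame)
pres2 <- data.frame(
  name  = c("Reagan", "Bush", "Clinton", "Bush", "Obama"),
  start = as.Date(c("1981-01-20", "1989-01-20", "1993-01-20",
                    "2001-01-20", "2009-01-20")),
  end   = as.Date(c("1989-01-20", "1993-01-20", "2001-01-20",
                    "2009-01-20", "2017-01-20")),
  party = c("Republican", "Republican", "Democratic",
            "Republican", "Democratic")
)

p_full <- ggplot(economics) +
  # Shaded terms go first so later layers draw on top (alpha is an assumption)
  geom_rect(aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = party),
            data = pres2, alpha = 0.2) +
  # Band of +/- 1 around the savings rate (alpha is an assumption)
  geom_ribbon(aes(x = date, ymin = psavert - 1, ymax = psavert + 1), alpha = 0.3) +
  geom_line(aes(x = date, y = psavert), color = "orange") +
  geom_vline(aes(xintercept = start), data = pres2) +
  # y = Inf with vjust/hjust pins each name near the top edge (an assumption)
  geom_text(aes(x = start, y = Inf, label = name), data = pres2,
            hjust = 0, vjust = 1.5) +
  # Values are matched to party in alphabetical order: Democratic, Republican
  scale_fill_manual(values = c("blue", "red")) +
  scale_x_date(date_breaks = "4 years", date_labels = "%Y") +
  labs(x = NULL, y = "Rate", title = "Personal Savings Rate") +
  theme_bw()

p_full
```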
package for data visualization
grammar of graphics
based on data
build layer by layer
rapid experimentation
fine-tuned control
R for Data Science
R Graphics Cookbook
ggplot book
Dave Childers
dachilde@cisco.com